UNIVERSITY OF NEW ENGLAND 


NAME:


STUDENT NUMBER: 
UNIT CODE: STAT200 
PAPER TITLE: Statistical Modeling for the Sciences II 
PAPER NUMBER: First and Only
DATE: Thursday 5 June 2014 TIME: 1:45 PM TO 4:00 PM 


TIME ALLOWED: Two (2) hours and fifteen minutes


NUMBER OF PAGES IN PAPER: NINE (9) 
NUMBER OF QUESTIONS ON PAPER: FIVE (5) 
NUMBER OF QUESTIONS TO BE ANSWERED: FIVE (5)


STATIONERY PER CANDIDATE: 6 PAGE ANSWER BOOKS; GENERAL PURPOSE ANSWER SHEET;
GRAPH PAPER SHEETS; GEOLOGY SAMPLES


OTHER AIDS REQUIRED: NIL 


POCKET CALCULATORS PERMITTED: YES (APPROVED MODELS ONLY) 


TEXTBOOKS OR NOTES PERMITTED: THIS IS AN OPEN BOOK EXAMINATION. 
ANY HARDCOPY MATERIALS ARE PERMITTED. 
INSTRUCTIONS FOR CANDIDATES: 

• Candidates MAY NOT start writing until instructed to do so by the supervisor

• Please pay attention to the announcements and read all instructions carefully before
commencing the paper

• Candidates MUST write their name and student number on the top of this page

• Questions are NOT of equal value

• TOTAL MARKS equals 54

• Explain your answers, using correct notation where appropriate

• This examination question paper MUST BE HANDED IN with worked scripts. Failure to do
so may result in the cancellation of all marks for this examination


REMEMBER TO WRITE YOUR NAME AND STUDENT NUMBER AT THE TOP OF THIS PAGE 


THE UNIVERSITY CONSIDERS IMPROPER CONDUCT IN EXAMINATIONS TO BE A SERIOUS OFFENCE. 
PENALTIES FOR CHEATING ARE EXCLUSION FROM THE UNIVERSITY FOR ONE YEAR AND/OR CANCELLATION 
OF ANY CREDIT RECEIVED IN THE EXAMINATION FOR THAT UNIT. 


STAT 200 Trimester 1, 2014 


Question 1 [11 marks] 


(a) Assume that a two-way analysis of variance reveals that there is a significant
interaction between the predictors. Explain why you should not interpret either of
the main effects in isolation. [2 marks]


(b) Explain why a classical linear model would not be appropriate when the response is
binary. [2 marks]


(c) Consider two factors A & B, with two levels each. Table 1 records the responses. Is
an interaction between A & B suggested by the data? Explain your response.
[3 marks]


(d) The design matrix and the vector of regression coefficients, corresponding to a model
with one qualitative predictor variable with 3 levels, are given below. [4 marks]

(i) What does the first regression coefficient represent?

(ii) What is the estimate for the mean of level 2?

(iii) Which level is the baseline or reference level?


level        X        β
  1      1  1  0      3
  2      1  0  1     -1
  3      1  0  0      0
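
For reference, fitted level means are obtained by multiplying a design matrix by its coefficient vector. The following is a minimal sketch in R, assuming a hypothetical three-level factor `level_demo` and made-up coefficients `beta_demo` (neither is taken from this paper). It uses R's default treatment coding, in which the first level is the reference, so it need not match the coding of the matrix printed above.

```r
# Hypothetical factor with three levels, one observation per level
# (illustrative only; not the data behind this question).
level_demo <- factor(c("1", "2", "3"))

# Default treatment coding: an intercept column plus one indicator column
# for every level except the first (reference) level.
X_default <- model.matrix(~ level_demo)

# Made-up coefficients: intercept, then the two indicator effects.
beta_demo <- c(10, 2, -3)

# Each fitted level mean is the matching row of X times the coefficients.
means_demo <- X_default %*% beta_demo

print(X_default)
print(means_demo)
```

The row for the reference level contains only the intercept, so its fitted mean is the first coefficient alone; the other rows add one effect each.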




Question 2 [7 marks]
The number of successful (S) and unsuccessful (F) mating pairs of two different species
(A & B) of hawks in two different valleys were recorded (Table 2). The research question
of interest is whether there is a difference in mating success rates between the two species.

A model was fitted and the edited R output is given in Table 3.


(a) Is the model a good fit? Explain your response. [2 marks]
(b) Explain why valley was fitted first in the model. [2 marks]
(c) With reference to relevant conditional probabilities based on the observed values,
give an informative conclusion. [3 marks]


Table 2:
Species Valley Result 


Table 3:

Response: count

                Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                                7       100.
valley                   0.0                100.   1.0000
species                  0.5                 99.   0.4794
result                   8.1                       0.0045
valley:species          25.1                      5.5e-07
valley:result           55.2                      1.1e-13
species:result          10.8                       0.0010

Question 3 [11 marks]


A study of fruit flies was conducted to determine whether an increase in sexual activity
would reduce the life spans of male fruit flies. One hundred and twenty-five male flies were
randomly allocated to one of five groups:


1. eightP: male assigned to live with 8 pregnant female fruit flies
2. eightV: male assigned to live with 8 virgin female fruit flies
3. oneP: male assigned to live with 1 pregnant female fruit fly
4. oneV: male assigned to live with 1 virgin female fruit fly
5. zero: male assigned to live alone (zero females).

(a) Give contrasts (C1 to C3) that will answer the following questions:
(i) Is there a difference in the mean longevity between the males in the zero group,
and those in all other groups? [2 marks]

(ii) Is there a difference in longevity between males assigned to live with virgin
female flies and those assigned to live with pregnant female flies? [2 marks]

(iii) Is there a difference in longevity between males assigned to live with eight
pregnant female flies and those assigned to live with one pregnant female fly?
[2 marks]


(b) Find a fourth contrast (C4) that will complete an orthogonal set of contrasts, and
state the question it will address. [2 marks]
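
For equally sized groups, a set of contrasts is orthogonal when each contrast's coefficients sum to zero and the products of the coefficients of every pair of contrasts also sum to zero. The sketch below shows one way to check this in R. It assumes a hypothetical three-group design with illustrative contrasts `K1` and `K2`; these are not the contrasts asked for in this question.

```r
# Hypothetical contrast coefficients for three equally sized groups
# (illustrative only; not the contrasts asked for in this question).
contrasts_demo <- rbind(
  K1 = c(2, -1, -1),  # group 1 versus the average of groups 2 and 3
  K2 = c(0,  1, -1)   # group 2 versus group 3
)

# Each contrast's coefficients should sum to zero.
row_sums <- rowSums(contrasts_demo)

# The off-diagonal entries of C %*% t(C) hold the sums of products of
# coefficients for each pair of contrasts; zero means orthogonal.
cross_products <- contrasts_demo %*% t(contrasts_demo)

print(row_sums)
print(cross_products)
```

The same check extends to any number of contrasts: every off-diagonal entry of the cross-product matrix should be zero.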




(c) A model was fitted that incorporated the four orthogonal contrasts, and the output
is given in Table 4. With reference to the contrasts, and the table of means, give an
informative interpretation of the output. [3 marks]


Table 4:

Analysis of Variance Table

Response: Longevity
           Df Sum Sq Mean Sq F value  Pr(>F)
C1          1   1170    1170    5.34   0.023
C2          1   6675    6675   30.44 2.0e-07
C3          1     26      26    0.12   0.732
C4          1   4068    4068   18.55 3.4e-05
Residuals 120  26314     219


> tapply(Longevity, Treatment, mean)
eightP eightV   oneP   oneV   zero
    63     39     65     57     64

Question 4 [13 marks]
An experiment on wheat was laid out in a completely randomised design with three
treatments (treat) and five plots per treatment. The variable of interest was yieldN, the
crop yield (kg/plot). The yields from the same plots in the previous year (pre-treatment)
were also recorded (X).


(a) Interpret the xyplot below. [2 marks] 


[xyplot: yieldN on the vertical axis]


(b) A stepwise regression was performed and the relevant R code and output is given in 
Table 5. Explain the process and output, and interpret the result. [3 marks] 
Table 5:

## R code ##

start.model <- lm(yieldN ~ X * treat)
formL <- formula(~1)
formU <- formula(~X * treat)
mods <- step(start.model,
             direction = "backward",
             scope = c(formL, formU))

## Output ##

Start: AIC=29
yieldN ~ X * treat

          Df Sum of Sq   RSS  AIC
- X:treat  2      7.09  53.3 27.0
<none>                  46.2 28.9

Step: AIC=27
yieldN ~ X + treat

          Df Sum of Sq   RSS  AIC
<none>                  53.3 27.0
- treat    2       102 154.8 39.0
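
The AIC column printed by step() for a linear model is computed, up to an additive constant, as n log(RSS/n) + 2p, where n is the number of observations and p the number of regression coefficients. The sketch below reproduces the two `<none>` AIC values in Table 5 from their RSS values. The helper `step_aic` is a hypothetical name introduced here; n = 15 follows from three treatments with five plots each.

```r
# Criterion reported by step() for lm fits (as in extractAIC):
#   n * log(RSS / n) + 2 * p
# step_aic is a hypothetical helper name used only for this sketch.
step_aic <- function(rss, n, p) n * log(rss / n) + 2 * p

n_plots <- 3 * 5  # three treatments, five plots each (from the question)

# p counts regression coefficients: 6 for X * treat, 4 for X + treat.
aic_full    <- step_aic(46.2, n_plots, 6)
aic_reduced <- step_aic(53.3, n_plots, 4)

print(round(c(full = aic_full, reduced = aic_reduced), 1))
```

Because the reduced model has the smaller value, backward selection drops the X:treat term at the first step.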




(c) With reference to the table of regression coefficients and their 95% confidence
intervals (Table 6), answer the following questions.

(i) Does the relationship between yieldN and X differ between treatments? Justify
your response with reference to relevant output from both Tables 5 & 6.
[4 marks]

(ii) Write down the equation of the regression line corresponding to treatment 3.
[2 marks]

(iii) Showing all calculations, determine the predicted yield (yieldN) for plots
receiving Treatment 3, with a previous year's yield of 30 kg.
[2 marks]


Table 6:

betaCI(mods)

            Estimate Std. Error  2.5 %  97.5 %
(Intercept)    -16.3        6.5 -30.57
X                1.2        0.2   0.79
treat2           1.9        det <“h269
treat3           6.7        1.6   3.17

treat <- relevel(treat, ref="3")
mods3 <- lm(yieldN ~ X + treat, x=T)
betaCI(mods3)

            Estimate Std. Error  2.5 %
(Intercept)    -9.56        6.1 -22.95
X                1.2        0.2   0.79
treat1          -6.7        1.6 -10.29
treat2          -4.8        1.4  -7.85

Question 5 [12 marks]
An animal scientist conducted an experiment to study the effect of water quality on
feedlot performance of steer calves. Four water quality treatments were used, with 2
replicate pens of animals for each treatment combination. The experiment was conducted
during two consecutive summers. The resulting design is a 2 x 4 factorial. The response is
the average daily gains for each of the 16 pens.


(a) Two models (mod.lme and mod2.lme) were fitted:

mod.lme <- lme(gain~water, random=~1|summer/water)

mod2.lme <- lme(gain~water, random=~1|summer)


(i) Which of the two factors (water or summer) is being treated as a random
effect? [1 mark]

(ii) Explain why it is appropriate that the factor identified in (i) should be treated
as random. [2 marks]

(iii) Interpret the interaction plot below and indicate what the plot suggests as a
likely model. [2 marks]


[Interaction plot: mean of gain (vertical axis) against summer (horizontal axis)]




(iv) Examine the model comparison below and interpret the (edited) output.
[2 marks]

anova(mod.lme, mod2.lme)

Model df  AIC   BIC  Test
       7 8.31 13.20
 10237 sles ire.     1 vs 2


(b) The (edited) R output for the final model is given below. 


(i) Using correct notation, derive the variance components. [3 marks]

(ii) What proportion of the total variance is attributable to residuals (unexplained
variability)? [2 marks]


Random effects:
 Formula: ~1 | summer
        (Intercept)
StdDev:       0.176

 Formula: ~1 | water %in% summer
        (Intercept) Residual
StdDev:       0.152 OLE
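
lme reports random effects as standard deviations, so each variance component is the square of the corresponding standard deviation, and a component's share of the total is its variance divided by the sum of all components. The sketch below illustrates the arithmetic in R with hypothetical standard deviations in `sd_demo`; these are made-up values, not the ones in the output above.

```r
# Hypothetical standard deviations for the three sources of variation
# (illustrative values only; not those reported in the output).
sd_demo <- c(summer = 0.20, water_in_summer = 0.10, residual = 0.10)

# Variance components are the squared standard deviations.
var_components <- sd_demo^2

# Share of the total variance attributable to each source.
proportion <- var_components / sum(var_components)

print(round(var_components, 4))
print(round(proportion, 3))
```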


Please remember: This examination question paper MUST BE HANDED IN. Failure
to do so may result in the cancellation of all marks for this examination. Writing your
name and number on the front will help us confirm that your paper has been returned.


